Concord
Evaluating Long-Context Reasoning in LLM-Based WebAgents
Chung, Andy, Zhang, Yichi, Lin, Kaixiang, Rawal, Aditya, Gao, Qiaozi, Chai, Joyce
As large language model (LLM)-based agents become increasingly integrated into daily digital interactions, their ability to reason across long interaction histories becomes crucial for providing personalized and contextually aware assistance. However, the performance of these agents in long context scenarios, particularly for action-taking WebAgents operating in realistic web environments, remains largely unexplored. This paper introduces a benchmark for evaluating long context reasoning capabilities of WebAgents through sequentially dependent subtasks that require retrieval and application of information from extended interaction histories. We develop a novel evaluation framework that simulates multi-session user interactions by injecting irrelevant task trajectories between dependent subtasks, creating contexts ranging from 25,000 to 150,000 tokens. Through extensive evaluation of four popular models, Claude-3.7, GPT-4.1, Llama 4, and o4-mini, we observe a dramatic performance degradation as context length increases, with success rates dropping from 40-50% in baseline conditions to less than 10% in long context scenarios. Our detailed error analysis reveals that agents primarily fail due to getting stuck in loops and losing track of original task objectives. We further propose an implicit RAG approach that provides modest improvements by generating task-relevant summaries, though fundamental limitations in long context reasoning persist. These findings highlight critical challenges for deploying WebAgents in realistic, long-term user interaction scenarios and provide insights for developing more robust agent architectures capable of maintaining coherent task execution across extended contexts.
- North America > The Bahamas (0.14)
- North America > United States > New York (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Workflow (0.93)
- Research Report > New Finding (0.93)
- Media (1.00)
- Consumer Products & Services (1.00)
- Transportation (0.93)
- Leisure & Entertainment > Sports > Basketball (0.46)
PIFON-EPT: MR-Based Electrical Property Tomography Using Physics-Informed Fourier Networks
Yu, Xinling, Serrallés, José E. C., Giannakopoulos, Ilias I., Liu, Ziyue, Daniel, Luca, Lattanzi, Riccardo, Zhang, Zheng
We propose Physics-Informed Fourier Networks for Electrical Properties (EP) Tomography (PIFON-EPT), a novel deep learning-based method for EP reconstruction using noisy and/or incomplete magnetic resonance (MR) measurements. Our approach leverages the Helmholtz equation to constrain two networks, responsible for the denoising and completion of the transmit fields, and the estimation of the object's EP, respectively. We embed a random Fourier features mapping into our networks to enable efficient learning of high-frequency details encoded in the transmit fields. We demonstrated the efficacy of PIFON-EPT through several simulated experiments at 3 and 7 tesla (T) MR imaging, and showed that our method can reconstruct physically consistent EP and transmit fields. Specifically, when only 20% of the noisy measured fields were used as inputs, PIFON-EPT reconstructed the EP of a phantom with ≤ 5% error, and denoised and completed the measurements with ≤ 1% error. Additionally, we adapted PIFON-EPT to solve the generalized Helmholtz equation that accounts for gradients of EP between inhomogeneities. This yielded improved results at interfaces between different materials without explicit knowledge of boundary conditions. PIFON-EPT is the first method that can simultaneously reconstruct EP and transmit fields from incomplete noisy MR measurements, providing new opportunities for EPT research.
- North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Gur, Izzeddin, Furuta, Hiroki, Huang, Austin, Safdari, Mustafa, Matsuo, Yutaka, Eck, Douglas, Faust, Aleksandra
Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web automation. However, the performance on real-world websites has still suffered from (1) open domainness, (2) limited context length, and (3) lack of inductive bias on HTML. We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites following natural language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those. We design WebAgent with Flan-U-PaLM, for grounded code generation, and HTML-T5, new pre-trained LLMs for long HTML documents using local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization. We empirically demonstrate that our modular recipe improves the success on real websites by over 50%, and that HTML-T5 is the best model to solve various HTML understanding tasks; achieving 18.7% higher success rate than the prior method on MiniWoB web automation benchmark, and SoTA performance on Mind2Web, an offline task planning evaluation.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Leisure & Entertainment (0.46)
- Information Technology (0.46)
- Banking & Finance > Real Estate (0.33)
- Information Technology > Communications > Web (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Creative AI, FinOps among hot developer trends of 2023
A handful of important trends will transform the software developer experience in 2023, as enterprises consider more self-hosting, observe more SaaS consolidations and see an upswing of interest in creative AI. Also, as AI enters the creativity realm, it threatens to upend the future of app dev. And OpenAI's ChatGPT, released in November, takes code completion beyond line suggestions -- in addition to writing complete web pages and simple applications, it can generate new programming languages. For developers, the 2022 job market started strong, but by December, they saw storm clouds as layoffs hit the tech sector. Experts felt vibes of the early 2000s recession and the pandemic's early days.
High School Sophomore Arrested For Hacking Computer System, Changing Grades Of Other Students
A Northern California teen was arrested Wednesday for hacking a school district's computer system and changing the grades of up to 15 students. Authorities said they arrested David Rotaro, a sophomore at Ygnacio Valley High School in Concord, California, for infiltrating the school district's computer system. Rotaro, 16, said it was like "stealing candy from a baby," according to KGO-TV, an ABC affiliate in San Francisco. It took him five minutes to design a "phishing email" that he sent out to swipe login information from school faculty. Authorities didn't release Rotaro's name; however, he confessed to having committed the crime during an interview with KGO-TV.
- North America > United States > California > San Francisco County > San Francisco (0.27)
- North America > United States > California > Contra Costa County > Concord (0.27)
- Europe > France > Île-de-France > Paris > Paris (0.07)
Where are self-driving cars being tested?
An Arizona woman was killed after being struck by a self-driving Uber vehicle, an incident believed to be the first of its kind. But Uber is not the only company that has experienced accidents with driverless cars. Companies like Google, Tesla and General Motors also join the list. An Arizona woman was killed after being struck by a self-driving Uber vehicle this week - prompting the company to suspend all testing of self-driving vehicles in cities across the country. The Uber was in autonomous mode at the time of the collision in Tempe, and there was a vehicle operator behind the wheel, police said.
- North America > United States > Arizona (0.54)
- North America > United States > California > San Francisco County > San Francisco (0.09)
- North America > United States > Nevada > Clark County > Las Vegas (0.07)
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
Uber's Robo-Truck, McLaren's Senna Supercar, and More Cars News This Week
If the phrase "autonomous vehicle" makes you think of some four-wheeled pod tootling around the city, you need to think bigger. For all the talk of robo-taxis, the smart money says that when this tech comes for our roads, it'll start on the highway. And if you're looking for proof, grab your sunglasses, a trucker hat, and a ticket to Arizona or Florida--the testing grounds of choice for the companies teaching trucks to drive themselves. This week, we have news of Uber testing in the Copper State and startup Starsky Robotics sending a truck down a Florida highway, all by itself. Meanwhile, the titans of the auto industry met at the Geneva Motor Show, where the talk centered on supercars--and how to take down Elon Musk.
- North America > United States > Arizona (0.26)
- Europe > Switzerland (0.06)
- North America > United States > District of Columbia > Washington (0.05)
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks > Manufacturer (1.00)
- Government > Regional Government > North America Government > United States Government (0.31)
Real-Time Energy Disaggregation of a Distribution Feeder's Demand Using Online Learning
Ledva, Gregory S., Balzano, Laura, Mathieu, Johanna L.
Though distribution system operators have been adding more sensors to their networks, they still often lack an accurate real-time picture of the behavior of distributed energy resources such as demand responsive electric loads and residential solar generation. Such information could improve system reliability, economic efficiency, and environmental impact. Rather than installing additional, costly sensing and communication infrastructure to obtain additional real-time information, it may be possible to use existing sensing capabilities and leverage knowledge about the system to reduce the need for new infrastructure. In this paper, we disaggregate a distribution feeder's demand measurements into: 1) the demand of a population of air conditioners, and 2) the demand of the remaining loads connected to the feeder. We use an online learning algorithm, Dynamic Fixed Share (DFS), that uses the real-time distribution feeder measurements as well as models generated from historical building- and device-level data. We develop two implementations of the algorithm and conduct case studies using real demand data from households and commercial buildings to investigate the effectiveness of the algorithm. The case studies demonstrate that DFS can effectively perform online disaggregation and that the choice and construction of models included in the algorithm affect its accuracy, which is comparable to that of a set of Kalman filters.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Hawaii (0.04)
- Energy > Power Industry (1.00)
- Energy > Renewable > Solar (0.88)
- Education > Educational Setting > Online (0.61)
WWII bombers once built on new Michigan driverless car test site
The ex-bomber plant and home of Rosie the Riveter will transform this year into an autonomous vehicle technology test site. It once housed one of the largest factories in the world, pumping out B24 bombers to help America and her allies win World War II, and later transmissions when it was owned by General Motors. The former Willow Run bomber plant in Ypsilanti Township is mostly a memory now, demolished following GM's 2009 bankruptcy, except for a piece that houses the Yankee Air Museum. The land at the former 335-acre Willow Run site in Ypsilanti Township, where the American Center for Mobility is located (January 2017), will be used for testing autonomous vehicles.
- North America > United States > North Carolina (0.05)
- North America > United States > Iowa (0.05)
- North America > United States > California > San Diego County > San Diego (0.05)
- Transportation > Passenger (1.00)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
Honda's Self-Driving Car Goes For A Test Run
On an old, decommissioned naval base in Concord, California, Honda showed off its latest advancements in autonomous car technology.
- Transportation > Passenger (0.84)
- Transportation > Ground > Road (0.84)
- Information Technology > Robotics & Automation (0.84)